3  Difference in Population Means (8.2.3 - 8.2.6)

When we have two independent random variables that both follow normal distributions, say \(X\sim N(\mu_X,\,\sigma_X)\) and \(Y\sim N(\mu_Y,\,\sigma_Y)\), it is often of interest to estimate the difference between the population means, \(\mu_X\) and \(\mu_Y\). A confidence interval for \(\mu_X-\mu_Y\) gives a range of plausible values for this difference, but the way the interval is calculated depends on what is known about the variances of the two distributions.

In order to construct confidence intervals, a random sample has to be drawn from each of the underlying distributions. We will use the notation \(n_X\) to denote the size of the sample drawn from \(X\sim N(\mu_X,\,\sigma_X)\) and \(n_Y\) to denote the size of the sample drawn from \(Y\sim N(\mu_Y,\,\sigma_Y)\). Let \(x_1, x_2, \ldots , x_{n_X}\), with sample mean \(\bar{x}\) and sample variance \(s^2_X\), represent the sample from \(X\), and let \(y_1, y_2, \ldots , y_{n_Y}\), with sample mean \(\bar{y}\) and sample variance \(s^2_Y\), represent the sample from \(Y\).

3.1 Variances are known and equal (8.2.3)

In the case where the variances of both distributions are known and are equal, that is \(\sigma_X^2=\sigma_Y^2=\sigma^2\), then a \((1-\alpha)\cdot100\%\) confidence interval for the difference in the population means, \(\mu_X-\mu_Y\), is given by,

\[\begin{equation} \tag{8.11} \begin{split} CI_{1-\alpha}(\mu_X-\mu_Y)=\Bigg[&\left(\bar x-\bar y\right)-z_{1-\frac{\alpha}{2}}\cdot\sigma\sqrt{\frac{1}{n_X}+\frac{1}{n_Y}},\\ &\,\,\,\,\,\,\,\,\left(\bar x-\bar y\right)+z_{1-\frac{\alpha}{2}}\cdot\sigma\sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}\,\Bigg] \end{split} \end{equation}\]

You can find the derivation of this result and some additional exercises in Section 8.2.3 of Probability and Statistics with R.
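As a rough illustration of formula (8.11), the following Python sketch computes the interval from summary statistics. The function name `ci_known_equal_var` and the numbers in the example are hypothetical, chosen only to show the arithmetic; the quantile \(z_{1-\frac{\alpha}{2}}\) is taken from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def ci_known_equal_var(xbar, ybar, sigma, n_x, n_y, alpha=0.05):
    """Interval for mu_X - mu_Y when both populations share a known sd sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * sigma * sqrt(1 / n_x + 1 / n_y)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Hypothetical summary statistics, for illustration only.
lower, upper = ci_known_equal_var(xbar=10.0, ybar=8.0, sigma=2.0, n_x=16, n_y=16)
print(lower, upper)
```

The interval is centred on \(\bar x-\bar y\), and its width shrinks as either sample size grows.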

3.2 Variances are known and unequal (8.2.4)

In the case where the variances of both distributions are known but are unequal, that is \(\sigma_X^2\neq\sigma_Y^2\), then a \((1-\alpha)\cdot100\%\) confidence interval for the difference in the population means, \(\mu_X-\mu_Y\), is given by,

\[\begin{equation} \tag{8.12} \begin{split} CI_{1-\alpha}(\mu_X-\mu_Y)=\Bigg[&\left(\bar x-\bar y\right)-z_{1-\frac{\alpha}{2}}\cdot\sqrt{\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}},\\ &\,\,\,\,\,\,\left(\bar x-\bar y\right)+z_{1-\frac{\alpha}{2}}\cdot\sqrt{\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}}\,\Bigg] \end{split} \end{equation}\]

For further examples of using this confidence interval, see Section 8.2.4 of Probability and Statistics with R.
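Formula (8.12) can be sketched in Python in a similar way. Here each population contributes its own known variance to the standard error. The function name `ci_known_unequal_var` and the example values are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_known_unequal_var(xbar, ybar, sigma_x, sigma_y, n_x, n_y, alpha=0.05):
    """Interval for mu_X - mu_Y when sigma_X and sigma_Y are known but differ."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    std_error = sqrt(sigma_x**2 / n_x + sigma_y**2 / n_y)
    diff = xbar - ybar
    return diff - z * std_error, diff + z * std_error


# Hypothetical values: the standard error is sqrt(9/9 + 16/16) = sqrt(2).
lower, upper = ci_known_unequal_var(
    xbar=10.0, ybar=8.0, sigma_x=3.0, sigma_y=4.0, n_x=9, n_y=16
)
print(lower, upper)
```

Setting `sigma_x` equal to `sigma_y` reproduces the equal-variance interval, since \(\sqrt{\sigma^2/n_X+\sigma^2/n_Y}=\sigma\sqrt{1/n_X+1/n_Y}\).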

3.3 Variances are unknown and assumed equal (8.2.5)

When random samples have been taken from two normal distributions where the variances are unknown but assumed to be equal, a \((1-\alpha)\cdot100\%\) confidence interval for \(\mu_X-\mu_Y\) is given by,

\[\begin{equation} \tag{8.15} \begin{split} CI_{1-\alpha}(\mu_X-\mu_Y)=\Bigg[&\left(\bar x-\bar y\right)-t_{1-\frac{\alpha}{2};\nu_p}\cdot s_p\sqrt{\frac{1}{n_X}+\frac{1}{n_Y}},\\ &\,\,\,\,\,\,\,\left(\bar x-\bar y\right)+t_{1-\frac{\alpha}{2};\nu_p}\cdot s_p\sqrt{\frac{1}{n_X}+\frac{1}{n_Y}}\,\Bigg] \end{split} \end{equation}\]
  • \(\nu_p\) represents the degrees of freedom for the associated \(t\) distribution. The degrees of freedom can be found as \(\nu_p=n_X+n_Y-2\).

  • \(s_p\) is a pooled estimate of the standard deviation that takes into account the sample sizes, \(n_X\) and \(n_Y\), taken from each distribution. An estimate for the pooled variance can be found using, \[s_p^2=\frac{\left(n_X-1\right)s_X^2+\left(n_Y-1\right)s_Y^2}{n_X+n_Y-2}\]

    where, \(s_X^2=\frac{\sum_{i=1}^{n_X}x_i^2-n_X\bar x^2}{n_X-1}\) and \(s_Y^2=\frac{\sum_{i=1}^{n_Y}y_i^2-n_Y\bar y^2}{n_Y-1}\).

    Remember to take the square root of the estimated variance to find the estimate of standard deviation.


For more examples of calculating these confidence intervals, see Section 8.2.5 of Probability and Statistics with R.
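The pooled calculation can be sketched as below. This is a minimal illustration using only the Python standard library, so the \(t\) quantile is passed in by hand (for example, from a table) rather than computed. The function names, the data, and the quoted table value \(t_{0.975;6}\approx 2.447\) are assumptions introduced for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_summary(x, y):
    """Return (xbar - ybar, s_p, nu_p) for two samples."""
    n_x, n_y = len(x), len(y)
    nu_p = n_x + n_y - 2  # degrees of freedom
    # statistics.variance uses the n - 1 denominator, matching s_X^2 and s_Y^2.
    s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / nu_p
    return mean(x) - mean(y), sqrt(s2_p), nu_p


def pooled_ci(diff, s_p, n_x, n_y, t_crit):
    """Interval (8.15), given the quantile t_{1 - alpha/2; nu_p} as t_crit."""
    half_width = t_crit * s_p * sqrt(1 / n_x + 1 / n_y)
    return diff - half_width, diff + half_width


# Hypothetical data, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 5.0, 7.0]
diff, s_p, nu_p = pooled_summary(x, y)

t_crit = 2.447  # assumed table value of t_{0.975; 6} for a 95% interval
lower, upper = pooled_ci(diff, s_p, len(x), len(y), t_crit)
print(nu_p, s_p, lower, upper)
```

Computing \(\nu_p\) first and then looking up the quantile mirrors the order of a hand calculation; with statistical software the quantile would normally be computed directly.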

3.4 Variances are unknown and unequal (8.2.6)

When random samples have been taken from two normal distributions where the variances, \(\sigma_X^2\) and \(\sigma_Y^2\), are unknown and cannot be assumed equal, a \((1-\alpha)\cdot100\%\) confidence interval for \(\mu_X-\mu_Y\) is given by,

\[\begin{equation} \tag{8.16} \begin{split} CI_{1-\alpha}(\mu_X-\mu_Y)=\Bigg[&\left(\bar x-\bar y\right)-t_{1-\frac{\alpha}{2};\nu}\cdot \sqrt{\frac{s_X^2}{n_X}+\frac{s_Y^2}{n_Y}},\\ &\,\,\,\,\,\,\,\left(\bar x-\bar y\right)+t_{1-\frac{\alpha}{2};\nu}\cdot\sqrt{\frac{s_X^2}{n_X}+\frac{s_Y^2}{n_Y}}\,\Bigg] \end{split} \end{equation}\]
  • \(\nu\) represents the degrees of freedom for the associated \(t\) distribution. The degrees of freedom can be found using, \[\nu=\frac{\left(\frac{s_X^2}{n_X}+\frac{s_Y^2}{n_Y}\right)^2}{\frac{\left(s_X^2/n_X\right)^2}{n_X-1}+\frac{\left(s_Y^2/n_Y\right)^2}{n_Y-1}}\]

    where, \(s_X^2=\frac{\sum_{i=1}^{n_X}x_i^2-n_X\bar x^2}{n_X-1}\) and \(s_Y^2=\frac{\sum_{i=1}^{n_Y}y_i^2-n_Y\bar y^2}{n_Y-1}\).


See Section 8.2.6 of Probability and Statistics with R for further examples of finding these confidence intervals.
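The degrees-of-freedom formula for \(\nu\) is the most error-prone part of this interval when worked by hand, so a short Python sketch may help. It returns the pieces of (8.16) other than the \(t\) quantile, which is left to a table or to statistical software. The function name `unequal_var_pieces` and the data are hypothetical. Note that \(\nu\) is generally not a whole number; how it is rounded for a table lookup is a convention not stated here.

```python
from math import sqrt
from statistics import mean, variance


def unequal_var_pieces(x, y):
    """Return (xbar - ybar, standard error, nu) for interval (8.16)."""
    n_x, n_y = len(x), len(y)
    v_x = variance(x) / n_x  # s_X^2 / n_X
    v_y = variance(y) / n_y  # s_Y^2 / n_Y
    std_error = sqrt(v_x + v_y)
    nu = (v_x + v_y) ** 2 / (v_x**2 / (n_x - 1) + v_y**2 / (n_y - 1))
    return mean(x) - mean(y), std_error, nu


# Hypothetical data with visibly different spreads, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 5.0, 9.0, 13.0, 17.0, 21.0]
diff, std_error, nu = unequal_var_pieces(x, y)
print(diff, std_error, nu)

# The interval is then diff -/+ t_crit * std_error, where t_crit is the
# quantile t_{1 - alpha/2; nu} obtained separately.
```

As a sanity check on the formula, when the two samples have equal sizes \(n\) and equal sample variances, \(\nu\) reduces to \(2(n-1)\), the same degrees of freedom as in the pooled case.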